Tied biases vs untied biases

For a convolution layer you can choose between tied and untied biases. With tied biases, the same bias is applied at every location in a feature map; with untied biases, each location in a feature map gets its own bias. (Currently, Blocks only supports untied biases, but I made a PR that should add support for tied biases soon.)
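To make the difference concrete, here is a minimal NumPy sketch (not Blocks code; the shapes are made up for illustration) of how the two kinds of bias are applied to a convolution's output:

```python
import numpy as np

# Output of a hypothetical convolution: (batch, channels, height, width).
batch, channels, height, width = 2, 4, 8, 8
feature_maps = np.random.randn(batch, channels, height, width)

# Tied biases: one bias per feature map, broadcast over every
# spatial location of that map.
tied_bias = np.zeros(channels)
out_tied = feature_maps + tied_bias.reshape(1, channels, 1, 1)

# Untied biases: a separate bias for every spatial location of
# every feature map.
untied_bias = np.zeros((channels, height, width))
out_untied = feature_maps + untied_bias.reshape(1, channels, height, width)
```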

Intuitively, I would say that it makes more sense to use tied biases, since the weights are also tied. However, Alexandre and Bart have reported that untied biases train much faster. I can see that untied biases add extra capacity to the model, but I didn't expect the difference to be that big.
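The extra capacity is easy to quantify: tied biases contribute one parameter per feature map, while untied biases contribute one per spatial location of each feature map. A quick back-of-the-envelope calculation, for hypothetical layer dimensions (not the ones from this experiment):

```python
# Bias parameter counts for a single convolution layer
# with made-up dimensions, purely for illustration.
channels, height, width = 32, 24, 24

tied_params = channels                     # one bias per feature map
untied_params = channels * height * width  # one bias per location

print(tied_params)    # 32
print(untied_params)  # 18432
```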

To verify their results, I ran the same experiment as in the previous post, but with tied biases. The results below indeed confirm that untied biases lead to better performance for this architecture.

[Figure: validation error for tied vs untied biases (tied_vs_untied_error)]

That untied biases perform better might also be explained by the fact that we are underfitting. Indeed, if we look at the error plot in the previous post, we see that even with untied biases we are not overfitting: the validation error/NLL flattens out but does not increase. Tied biases might be a good idea after all if we increase the capacity of the model somewhere else (e.g. by increasing the number of feature maps).
